compact model
Can Compact Language Models Search Like Agents? Distillation-Guided Policy Optimization for Preserving Agentic RAG Capabilities
Kotoge, Rikuto, Nishimura, Mai, Ma, Jiaxin
Reinforcement Learning (RL) has emerged as a dominant post-training approach for eliciting agentic RAG behaviors such as search and planning from language models. Despite its success with larger models, applying RL to compact models (e.g., 0.5--1B parameters) presents unique challenges: these models exhibit poor initial performance, resulting in sparse rewards and unstable training. To overcome these difficulties, we propose Distillation-Guided Policy Optimization (DGPO), which employs cold-start initialization from teacher demonstrations and continuous teacher guidance during policy optimization. To understand how compact models preserve agentic behavior, we introduce Agentic RAG Capabilities (ARC), a fine-grained metric analyzing reasoning, search coordination, and response synthesis. Comprehensive experiments demonstrate that DGPO enables compact models to achieve sophisticated agentic search behaviors, in some cases even outperforming the larger teacher model. DGPO makes agentic RAG feasible in resource-constrained computing environments.
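The two DGPO ingredients described in the abstract, cold-start fine-tuning on teacher demonstrations followed by policy optimization under continuous teacher guidance, can be sketched in a few lines. Below is a minimal, illustrative PyTorch version of the combined training objective; the tensor shapes, names, and the weighting `beta` are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def dgpo_step(student_logits, teacher_logits, actions, advantages, beta=0.1):
    """Policy-gradient step with continuous teacher guidance (illustrative).

    student_logits, teacher_logits: (batch, seq, vocab) from the compact
    policy and a frozen teacher; actions: sampled token ids (batch, seq);
    advantages: per-sequence reward advantages (batch,).
    """
    logp = F.log_softmax(student_logits, dim=-1)
    act_logp = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # (B, T)
    pg_loss = -(advantages.unsqueeze(-1) * act_logp).mean()  # REINFORCE-style term
    # Teacher guidance: a KL term that pulls the student toward the frozen
    # teacher's distribution, stabilizing training when task rewards are sparse.
    kl = F.kl_div(logp, F.log_softmax(teacher_logits, dim=-1),
                  log_target=True, reduction="batchmean")
    return pg_loss + beta * kl
```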
- Leisure & Entertainment (1.00)
- Education (0.69)
- Media > Music (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Embedding-Enhanced Probabilistic Modeling of Ferroelectric Field Effect Transistors (FeFETs)
Afee, Tasnia Nobi, Hutchins, Jack, Islam, Md Mazharul, Kampfe, Thomas, Aziz, Ahmedullah
FeFETs hold strong potential for advancing memory and logic technologies, but their inherent randomness arising from both operational cycling and fabrication variability poses significant challenges for accurate and reliable modeling. Capturing this variability is critical, as it enables designers to predict behavior, optimize performance, and ensure reliability and robustness against variations in manufacturing and operating conditions. Existing deterministic and machine learning-based compact models often fail to capture the full extent of this variability or lack the mathematical smoothness required for stable circuit-level integration. In this work, we present an enhanced probabilistic modeling framework for FeFETs that addresses these limitations. Building upon a Mixture Density Network (MDN) foundation, our approach integrates C∞ continuous activation functions for smooth, stable learning and a device-specific embedding layer to capture intrinsic physical variability across devices. Sampling from the learned embedding distribution enables the generation of synthetic device instances for variability-aware simulation. With an R² of 0.92, the model demonstrates high accuracy in capturing the variability of FeFET current behavior. Altogether, this framework provides a scalable, data-driven solution for modeling the full stochastic behavior of FeFETs and offers a strong foundation for future compact model development and circuit simulation integration.
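A minimal PyTorch sketch of the architecture described above: an MDN head conditioned on bias inputs plus a learned per-device embedding, built from smooth (C∞) activations such as tanh. All sizes and names are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class FeFETMDN(nn.Module):
    """Mixture density network over drain current, conditioned on bias
    voltages plus a learned per-device embedding (illustrative sketch)."""
    def __init__(self, n_devices, n_inputs=2, emb_dim=8, hidden=64, n_mix=5):
        super().__init__()
        self.emb = nn.Embedding(n_devices, emb_dim)  # device-to-device variability
        self.net = nn.Sequential(                    # tanh is C-infinity smooth
            nn.Linear(n_inputs + emb_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.pi = nn.Linear(hidden, n_mix)         # mixture weights (logits)
        self.mu = nn.Linear(hidden, n_mix)         # component means
        self.log_sigma = nn.Linear(hidden, n_mix)  # component scales (log)

    def forward(self, v_bias, device_id):
        h = self.net(torch.cat([v_bias, self.emb(device_id)], dim=-1))
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_nll(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of currents y under the Gaussian mixture."""
    comp = torch.distributions.Normal(mu, log_sigma.exp())
    log_prob = comp.log_prob(y.unsqueeze(-1)) + torch.log_softmax(pi_logits, dim=-1)
    return -torch.logsumexp(log_prob, dim=-1).mean()
```

Sampling new embedding vectors, e.g. from a Gaussian fitted to the learned embedding table, would then yield synthetic device instances for variability-aware simulation, as the abstract describes.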
A Compact Model for Large-Scale Time Series Forecasting
Yeh, Chin-Chia Michael, Fan, Xiran, Jiang, Zhimeng, Fan, Yujie, Chen, Huiyuan, Saini, Uday Singh, Lai, Vivian, Dai, Xin, Wang, Junpeng, Zhuang, Zhongfang, Wang, Liang, Zheng, Yan
Spatio-temporal data, which commonly arise in real-world applications such as traffic monitoring, financial transactions, and ride-share demands, represent a special category of multivariate time series. They exhibit two distinct characteristics: high dimensionality and commensurability across spatial locations. These attributes call for computationally efficient modeling approaches and facilitate the use of univariate forecasting models in a channel-independent fashion. SparseTSF, a recently introduced competitive univariate forecasting model, harnesses periodicity to achieve compactness by concentrating on cross-period dynamics, thereby extending the Pareto frontier with respect to model size and predictive performance. Nonetheless, it underperforms on spatio-temporal data due to an inadequate capture of intra-period temporal dependencies. To address this shortcoming, we propose UltraSTF, which integrates a cross-period forecasting module with an ultra-compact shape bank component. Our model effectively detects recurring patterns in time series through the attention mechanism of the shape bank component, thereby strengthening its ability to learn intra-period dynamics. UltraSTF achieves state-of-the-art performance on the LargeST benchmark while employing fewer than 0.2% of the parameters required by the second-best approaches, thus further extending the Pareto frontier of existing methods.
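The cross-period idea is easy to state in code: reshape the lookback window into (period, number of periods), forecast each phase of the period with one shared linear map, and add an intra-period pattern mixed from a small learned shape bank via attention. The sketch below is an illustrative reading of that design, not the UltraSTF implementation; `period`, `n_shapes`, and the additive combination are assumptions.

```python
import torch
import torch.nn as nn

class CrossPeriodForecaster(nn.Module):
    """Cross-period linear forecasting in the spirit of SparseTSF, plus a
    learned "shape bank" attended to for intra-period structure (sketch)."""
    def __init__(self, lookback, horizon, period, n_shapes=16):
        super().__init__()
        assert lookback % period == 0 and horizon % period == 0
        # One linear map shared across all phases of the period: it sees the
        # sequence of same-phase samples, i.e. the cross-period dynamics.
        self.cross = nn.Linear(lookback // period, horizon // period)
        # Shape bank: learned intra-period templates, mixed via attention.
        self.shapes = nn.Parameter(torch.randn(n_shapes, period))
        self.query = nn.Linear(lookback, n_shapes)
        self.period, self.horizon = period, horizon

    def forward(self, x):                                 # x: (batch, lookback)
        b = x.size(0)
        xp = x.view(b, -1, self.period).transpose(1, 2)   # (b, period, n_periods)
        yp = self.cross(xp)                               # (b, period, h_periods)
        y = yp.transpose(1, 2).reshape(b, self.horizon)   # cross-period forecast
        attn = torch.softmax(self.query(x), dim=-1)       # (b, n_shapes)
        shape = attn @ self.shapes                        # (b, period) pattern
        return y + shape.repeat(1, self.horizon // self.period)
```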
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)
- Information Technology > Data Science > Data Mining (0.84)
Dissertation: Machine Learning in Materials Science -- A Case Study in Carbon Nanotube Field-Effect Transistors
Carbon nanotubes have long been seen as a promising candidate for high-performance electronics, yet their unique 1D structure leads to challenges in device fabrication. Many processing approaches have been proposed to produce better-performing CNTFETs, and this explosion of data calls for an efficient way to explore it.
- Semiconductors & Electronics (1.00)
- Materials (0.75)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
- Energy > Oil & Gas > Upstream (0.46)
Optimizing Performance: How Compact Models Match or Exceed GPT's Classification Capabilities through Fine-Tuning
Lefort, Baptiste, Benhamou, Eric, Ohana, Jean-Jacques, Saltiel, David, Guez, Beatrice
In this paper, we demonstrate that non-generative, small-sized models such as FinBERT and FinDRoBERTa, when fine-tuned, can outperform GPT-3.5 and GPT-4 in zero-shot settings on sentiment analysis of financial news. These fine-tuned models show results comparable to GPT-3.5 when it is fine-tuned on the task of determining market sentiment from daily financial news summaries sourced from Bloomberg. To fine-tune and compare these models, we created a novel database that assigns a market score to each piece of news without human interpretation bias, systematically identifying the mentioned companies and analyzing whether their stocks went up, down, or remained neutral. Furthermore, the paper shows that the assumptions of Condorcet's Jury Theorem do not hold, suggesting that the fine-tuned small models are not independent of the fine-tuned GPT models, indicating behavioural similarities. Lastly, the resulting fine-tuned models are made publicly available on HuggingFace, providing a resource for further research in financial sentiment analysis and text classification.
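The bias-free labeling rule described above can be sketched directly: gather the realized returns of the companies a news item mentions and label the item by their average move, with a neutral band. The threshold below is an illustrative assumption, not the paper's value.

```python
def market_label(news_returns, band=0.005):
    """Assign a market-based sentiment label to one news item from the
    realized returns of the companies it mentions (illustrative rule;
    the 0.5% neutral band is an assumption).

    news_returns: list of next-day returns, one per mentioned company.
    """
    if not news_returns:
        return "neutral"
    avg = sum(news_returns) / len(news_returns)
    if avg > band:
        return "up"
    if avg < -band:
        return "down"
    return "neutral"

# e.g. a story mentioning two tickers that moved +1.2% and +0.4%:
print(market_label([0.012, 0.004]))  # -> "up"
```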
- Asia (0.68)
- North America > United States (0.46)
- Banking & Finance > Trading (1.00)
- Energy > Oil & Gas > Trading (0.46)
Compact Model Parameter Extraction via Derivative-Free Optimization
Martinez, Rafael Perez, Iwamoto, Masaya, Woo, Kelly, Bian, Zhengliang, Tinti, Roberto, Boyd, Stephen, Chowdhury, Srabanti
In this paper, we address the problem of compact model parameter extraction, simultaneously extracting tens of parameters via derivative-free optimization. Traditionally, parameter extraction is performed manually by dividing the complete set of parameters into smaller subsets, each targeting different operational regions of the device, a process that can take days or even weeks. Our approach streamlines this process by employing derivative-free optimization to identify a parameter set that fits the compact model well without performing an exhaustive number of simulations. We further enhance the optimization process to address critical issues in device modeling by carefully choosing a loss function that evaluates model performance consistently across varying magnitudes by focusing on relative rather than absolute errors, prioritizes accuracy in key operational regions of the device above a certain threshold, and reduces sensitivity to outliers. Furthermore, we use a train-test split to assess model fit and avoid overfitting: we fit 80% of the data and test the model's efficacy on the remaining 20%. We demonstrate the effectiveness of our methodology by successfully modeling two semiconductor devices: a diamond Schottky diode and a GaN-on-SiC HEMT, the latter involving the ASM-HEMT DC model, which requires simultaneously extracting 35 model parameters to fit the model to the measured data. These examples showcase the practical benefits of derivative-free optimization in device modeling.
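A minimal sketch of this recipe with SciPy: a relative-error, outlier-robust loss over an 80/20 train-test split, minimized with a derivative-free method (Nelder-Mead here). The `model` callable stands in for the compact-model simulator, and the current floor, log-cosh penalty, and iteration budget are illustrative assumptions, not the paper's choices.

```python
import numpy as np
from scipy.optimize import minimize

def relative_error_loss(params, v, i_meas, model, i_floor=1e-9):
    """Relative-error loss over measured I-V arrays: consistent weighting
    across decades of current, with points below `i_floor` masked out
    (threshold and `model` are illustrative stand-ins)."""
    i_sim = model(v, params)
    mask = np.abs(i_meas) > i_floor          # focus on key operating regions
    rel = (i_sim[mask] - i_meas[mask]) / i_meas[mask]
    return np.mean(np.log(np.cosh(rel)))     # soft, outlier-robust penalty

def extract(v, i_meas, model, x0, seed=0):
    """80/20 train-test split, then a derivative-free search (Nelder-Mead)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(v))
    tr, te = idx[: int(0.8 * len(v))], idx[int(0.8 * len(v)):]
    res = minimize(relative_error_loss, x0,
                   args=(v[tr], i_meas[tr], model),
                   method="Nelder-Mead", options={"maxiter": 5000})
    test_loss = relative_error_loss(res.x, v[te], i_meas[te], model)
    return res.x, test_loss
```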
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Sonoma County > Santa Rosa (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- (5 more...)
Fast System Technology Co-Optimization Framework for Emerging Technology Based on Graph Neural Networks
Ma, Tianliang, Fan, Guangxi, Sun, Xuguang, Deng, Zhihui, Low, Kainlu, Shao, Leilai
This paper proposes a fast system technology co-optimization (STCO) framework that optimizes power, performance, and area (PPA) for next-generation IC design, addressing the challenges and opportunities presented by novel materials and device architectures. We focus on accelerating the technology level of STCO with AI techniques, employing graph neural network (GNN)-based approaches for both TCAD simulation and cell library characterization; these are interconnected through a unified compact model and collectively achieve over a 100X speedup over traditional methods. These advancements enable comprehensive STCO iterations with runtime speedups ranging from 1.9X to 14.1X and support both emerging and traditional technologies.
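As a rough illustration of the GNN-surrogate idea, the sketch below implements a generic message-passing regressor that maps a device or cell graph (node features plus adjacency) to a scalar target such as a characterization value. It is a stand-in written in plain PyTorch, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GNNSurrogate(nn.Module):
    """Generic message-passing surrogate: node features and an adjacency
    matrix in, one scalar prediction out (a minimal illustrative sketch)."""
    def __init__(self, in_dim, hidden=64, n_layers=3):
        super().__init__()
        self.inp = nn.Linear(in_dim, hidden)
        self.msgs = nn.ModuleList(nn.Linear(hidden, hidden)
                                  for _ in range(n_layers))
        self.out = nn.Linear(hidden, 1)

    def forward(self, x, adj):               # x: (n, in_dim), adj: (n, n)
        h = torch.relu(self.inp(x))
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        for lin in self.msgs:
            h = torch.relu(lin((adj @ h) / deg) + h)  # mean aggregation + residual
        return self.out(h.mean(0))           # graph-level readout
```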
- Asia > China > Shanghai > Shanghai (0.05)
- North America > United States > California > San Diego County > San Diego (0.04)
Distill or Annotate? Cost-Efficient Fine-Tuning of Compact Models
Kang, Junmo, Xu, Wei, Ritter, Alan
Fine-tuning large models is highly effective; however, inference can be expensive and produces carbon emissions. Knowledge distillation has been shown to be a practical solution for reducing inference costs, but the distillation process itself requires significant computational resources. Rather than buying or renting GPUs to fine-tune, then distill, a large model, an NLP practitioner might instead choose to allocate the available budget to hiring annotators and manually labeling additional fine-tuning data. In this paper, we investigate how to most efficiently use a fixed budget to build a compact model. Through extensive experiments on six diverse tasks, we show that distilling from T5-XXL (11B) to T5-Small (60M) is almost always a cost-efficient strategy compared to annotating more data to directly train a compact model (T5-Small). We further investigate how the optimal budget allocated towards computation varies across scenarios. We will make our code, datasets, annotation cost estimates, and baseline models available as a benchmark to support further work on cost-efficient training of compact models.
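The distillation side of this trade-off can be sketched as sequence-level distillation: the teacher pseudo-labels unlabeled inputs, and the compact student is then fine-tuned on those pairs instead of human annotations. The snippet below uses `t5-small` as a stand-in for both models (the paper distills T5-XXL into T5-Small); it is illustrative, not the authors' pipeline.

```python
# Sequence-level distillation: the teacher pseudo-labels unlabeled inputs,
# then the compact student is fine-tuned on those pseudo-labels.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

teacher_name = "t5-small"  # stand-in; the paper's teacher is T5-XXL (11B)
tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForSeq2SeqLM.from_pretrained(teacher_name)

def pseudo_label(texts, max_new_tokens=32):
    """Generate teacher outputs to use as distillation targets."""
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    out = teacher.generate(**batch, max_new_tokens=max_new_tokens)
    return tok.batch_decode(out, skip_special_tokens=True)

# The resulting (text, pseudo_label) pairs then replace human annotations
# in a standard seq2seq fine-tuning loop for the student.
```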
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- (5 more...)
Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference
Zhou, Wangchunshu, Bras, Ronan Le, Choi, Yejin
Pre-trained Transformer models like T5 and BART have advanced the state of the art on a wide range of text generation tasks. Compressing these models into smaller ones has become critically important for practical use. Common neural network compression techniques such as knowledge distillation or quantization are limited to static compression, where the compression ratio is fixed. In this paper, we introduce Modular Transformers, a modularized encoder-decoder framework for flexible sequence-to-sequence model compression. Modular Transformers train modularized layers that have the same function as two or more consecutive layers in the original model via module replacing and knowledge distillation. After training, the modularized layers can be flexibly assembled into sequence-to-sequence models that meet different performance-efficiency trade-offs. Experimental results show that after a single training phase, by simply varying the assembling strategy, Modular Transformers can achieve flexible compression ratios from 1.1x to 6x with little to moderate relative performance drop.
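The module-replacing objective can be sketched as output matching: a compact module is trained so that its output on a hidden state agrees with that of the span of frozen original layers it will replace. A minimal, illustrative PyTorch version follows; the names and the pure-MSE objective are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn as nn

def module_replacing_loss(original_layers, compact_module, h, alpha=1.0):
    """Train `compact_module` to stand in for a span of consecutive original
    layers by matching their output on hidden states `h` (illustrative)."""
    with torch.no_grad():
        target = h
        for layer in original_layers:   # e.g. two frozen encoder layers
            target = layer(target)
    return alpha * nn.functional.mse_loss(compact_module(h), target)

# After training, spans of the original model can be swapped for their
# compact modules in any combination, giving a range of compression ratios.
```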
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Texas (0.04)
- North America > Dominican Republic (0.04)
- (16 more...)
On the Effectiveness of Compact Biomedical Transformers
Rohanian, Omid, Nouriborji, Mohammadmahdi, Kouchaki, Samaneh, Clifton, David A.
Language models pre-trained on biomedical corpora, such as BioBERT, have recently shown promising results on downstream biomedical tasks. Many existing pre-trained models, on the other hand, are resource-intensive and computationally heavy owing to factors such as embedding size, hidden dimension, and number of layers. The natural language processing (NLP) community has developed numerous strategies to compress these models using techniques such as pruning, quantisation, and knowledge distillation, resulting in models that are considerably faster, smaller, and consequently easier to use in practice. By the same token, in this paper we introduce six lightweight models, namely BioDistilBERT, BioTinyBERT, BioMobileBERT, DistilBioBERT, TinyBioBERT, and CompactBioBERT, which are obtained either by knowledge distillation from a biomedical teacher or by continual learning on the PubMed dataset via the Masked Language Modelling (MLM) objective. We evaluate all of our models on three biomedical tasks and compare them with BioBERT-v1.1, with the goal of creating efficient lightweight models that perform on par with their larger counterparts. All the models will be publicly available on our Huggingface profile at https://huggingface.co/nlpie
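Of the two routes mentioned, the continual-learning one is straightforward to sketch with Hugging Face: resume masked-language-model training of a compact encoder on biomedical text. The snippet below uses `distilbert-base-uncased` and two toy sentences as stand-ins for the actual models and the PubMed corpus.

```python
# Continual MLM pretraining of a compact encoder on biomedical text (sketch).
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")  # compact start
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")

# Stand-in for a tokenized PubMed abstract corpus (two toy sentences here):
texts = ["Aspirin irreversibly inhibits cyclooxygenase.",
         "BRCA1 mutations increase breast cancer risk."]
enc = tok(texts, truncation=True)
dataset = [{"input_ids": i, "attention_mask": a}
           for i, a in zip(enc["input_ids"], enc["attention_mask"])]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="compact-bio-mlm",
                           per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15),
)
trainer.train()
```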
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Hong Kong (0.04)
- (4 more...)
- Overview (0.93)
- Research Report (0.82)